feat: Added support for Claude 3+ Chat API in Bedrock #2870
Conversation
thanks for the PR @RyanKadri! A few tiny suggestions but you also have merge conflicts so I didn't do a thorough review yet.
const chatCompletionMessage = new LlmChatCompletionMessage({
  agent,
  segment,
  bedrockCommand,
  bedrockResponse,
  isResponse: true,
- index: index + 1,
+ index: index,
Suggested change:
- index: index,
+ index,
this is a little confusing. what about splitting vars so the prompt index can be promptIndex and then in here you have completionIndex, then this index is promptIndex + completionIndex
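For illustration, the suggested naming might look roughly like this. This is a sketch only: the messageEvents array and promptCount are illustrative rather than the agent's actual code, LlmChatCompletionMessage is assumed to be in scope as in the diff above, and other constructor arguments are omitted.

```js
// Illustrative sketch only: prompt messages keep their own index, and
// completion (response) messages continue numbering after the last prompt.
const messageEvents = []

bedrockCommand.prompt.forEach((prompt, promptIndex) => {
  messageEvents.push(new LlmChatCompletionMessage({
    agent,
    segment,
    bedrockCommand,
    bedrockResponse,
    isResponse: false,
    index: promptIndex
    // ...other constructor args omitted for brevity
  }))
})

const promptCount = bedrockCommand.prompt.length
bedrockResponse.completions.forEach((completion, completionIndex) => {
  messageEvents.push(new LlmChatCompletionMessage({
    agent,
    segment,
    bedrockCommand,
    bedrockResponse,
    isResponse: true,
    index: promptCount + completionIndex
    // ...other constructor args omitted for brevity
  }))
})
```

That way a prompt message and the first completion can never collide on the same index.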
please run
Sounds good 👍. I have a couple other changes incoming for another unit test scenario I want to cover
Force-pushed from 23c37dc to c31dace
(Looking into unit tests)
@RyanKadri is this ready for another review? You have merge conflicts again
Sorry. I got caught up in a couple other tasks. Still taking a look at the tests. I'll fix the merge conflicts as well
Force-pushed from c31dace to b6b59d0
embeddings.forEach(embedding => {
  recordEvent({ agent, type: 'LlmEmbedding', msg: embedding })
I need to do a bit more research on this but I think some of the Bedrock embedding models allow you to make a single invoke call that generates several embeddings (see this Cohere blog for an example). So I think it might be correct to allow unrolling one embedding command to several embedding events. Currently all the embedding models in the command class produce a single prompt but I'm wondering if the Cohere one is also incorrectly squashing messages in some cases.
I might try to treat that as a separate PR / task if you're alright with that though
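For reference, a hedged sketch of how that unrolling could work if one invoke call carries several input texts. The texts/embeddings field names below follow a Cohere-style batch shape and are assumptions, not something verified against every Bedrock embedding model.

```js
// Illustrative sketch only: pair each input text with its returned vector so
// a single invoke call can be recorded as several LlmEmbedding events.
function unrollEmbeddings(requestBody, responseBody) {
  const inputs = requestBody.texts ?? [requestBody.inputText] // batch vs single-input shapes (assumed)
  const vectors = responseBody.embeddings ?? [responseBody.embedding] // assumed response shapes

  return inputs.map((input, i) => ({ input, vector: vectors[i] }))
}

// Each entry could then be recorded the same way as in the loop above:
// embeddings.forEach((embedding) => recordEvent({ agent, type: 'LlmEmbedding', msg: embedding }))
```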
If that's the case, how do you want to handle an error? I assume we still want one error attached to the transaction? I opted to only attach the embedding info if there's one event to keep the current behavior but I'm not sure that's correct
I don't know this well enough, but it seems ok for now
@bizob2828 I think I'm good for another review if you have a sec. I see some versioned tests are failing but they're complaining about an s3 client version and I don't think I touched anything that should break that?
overall looks good, just a few questions around some testing gaps
@@ -36,7 +36,7 @@ class LlmChatCompletionSummary extends LlmEvent {
    const cmd = this.bedrockCommand
    this[cfr] = this.bedrockResponse.finishReason
    this[rt] = cmd.temperature
-   this[nm] = 1 + this.bedrockResponse.completions.length
+   this[nm] = (this.bedrockCommand.prompt?.length ?? 0) + this.bedrockResponse.completions.length
based on the bedrockCommand file this is always an array, so if the length is 0 it'll be 0. do we need this defaulting to 0 when prompt.length is falsey?
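In other words, if prompt is always an array, the defaulting could presumably be dropped. A sketch of that simplification, not a confirmed change:

```js
this[nm] = this.bedrockCommand.prompt.length + this.bedrockResponse.completions.length
```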
@@ -63,7 +65,7 @@ class BedrockResponse {
      // Streamed response
      this.#completions = body.completions
    } else {
-     this.#completions = body?.content?.map((c) => c.text)
+     this.#completions = [stringifyClaudeChunkedMessage(body?.content)]
i see a versioned test but not a unit test for this
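A unit test along these lines could cover it. The require path and node:test usage are guesses about the test layout, and the expected placeholder behavior is illustrative rather than the agent's exact output.

```js
const test = require('node:test')
const assert = require('node:assert')
// Hypothetical path for illustration; point this at wherever the helper actually lives.
const { stringifyClaudeChunkedMessage } = require('../../lib/llm-events/aws-bedrock/utils')

test('joins text chunks and substitutes placeholders for non-text chunks', () => {
  const chunks = [
    { type: 'text', text: 'What is this animal' },
    { type: 'image', source: { type: 'base64', media_type: 'image/png', data: 'abc123' } },
    { type: 'text', text: 'and where does it live?' }
  ]

  const result = stringifyClaudeChunkedMessage(chunks)

  assert.ok(result.includes('What is this animal'))
  assert.ok(result.includes('and where does it live?'))
  assert.ok(!result.includes('abc123')) // raw image payload should not leak into the string
})
```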
  result = collected.join(' ')
  return [{ role: 'user', content: this.#body.prompt }]
} else if (this.isClaudeMessagesApi() === true) {
  return normalizeClaude3Messages(this.#body?.messages ?? [])
I don't see a unit or versioned test that asserts this.#body.messages being falsey defaults to an empty array. is this needed? if so, please write a unit test
This was meant to handle the customer passing an invalid body. I realized the default value isn't needed here because I have it handled below anyway. I added a unit test for this case anyway to catch where it's handled lower down
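Roughly the kind of assertion that test could make. The BedrockCommand constructor shape is assumed here from the InvokeModel input (modelId plus a JSON body string), and BedrockCommand is assumed to be in scope, so treat this as a sketch.

```js
const test = require('node:test')
const assert = require('node:assert')

test('returns an empty prompt when the Messages API body has no messages', () => {
  // Hypothetical invalid body: a Claude 3 request missing the messages array.
  const command = new BedrockCommand({
    modelId: 'anthropic.claude-3-sonnet-20240229-v1:0',
    body: JSON.stringify({ anthropic_version: 'bedrock-2023-05-31', max_tokens: 100 })
  })

  assert.deepEqual(command.prompt, [])
})
```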
I reviewed it. Yea, that s3 failure seems like an issue with npm and installing, it passes locally for me, if it's still failing after i re-ran, i can help dig
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2870      +/-   ##
==========================================
- Coverage   97.39%   97.34%   -0.06%
==========================================
  Files         308      310       +2
  Lines       47330    47512     +182
==========================================
+ Hits        46099    46252     +153
- Misses       1231     1260      +29

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
 */
function normalizeClaude3Messages(messages) {
  const result = []
  for (const message of messages ?? []) {
i'm going to approve this but we can follow up on the missing coverage here: messages is falsey and there is an array of messages but one is null
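For that follow-up, a sketch of how the helper could guard both of those gaps. This is illustrative only, not the PR's actual implementation, and stringifyClaudeChunkedMessage is assumed to be in scope.

```js
// Illustrative sketch only: tolerate a falsey messages argument and skip
// null entries instead of throwing when reading message.role.
function normalizeClaude3Messages(messages) {
  const result = []
  for (const message of messages ?? []) {
    if (message == null) {
      continue
    }
    const content = typeof message.content === 'string'
      ? message.content
      : stringifyClaudeChunkedMessage(message.content)
    result.push({ role: message.role, content })
  }
  return result
}
```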
Description
The current version of the Bedrock instrumentation is mostly geared toward supporting the Claude Text Completions API but may not split messages as expected in multi-turn conversations with the Messages API.
The Text Completions API supports a single prompt message, but the Messages API lets you provide multiple distinct messages (context and prompt) to the model. Currently the agent combines those separate context messages into a single prompt, but it probably should follow the pattern we use in OpenAI instrumentation of splitting them into multiple LlmChatCompletionMessage events (still one LlmChatCompletionSummary per API call).

To make things more complicated, the Messages API also allows messages to have sub-chunks to support multi-modal use-cases (Anthropic docs for how this works). For example, one conceptual multi-modal message might be "What is this animal <picture of an elephant> and where does it live?". For these cases, if a single message has multiple chunks, this PR attempts to join those together into one string representation with reasonable placeholders for non-text components. Further down the road, the AI Monitoring team needs to do a bit of work to see if we can represent those non-text chunks as placeholders in a string. For now, though, we just want something to show.
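As a rough illustration of that joining behavior, here is a stand-in for what stringifyClaudeChunkedMessage is meant to do. The placeholder strings and the chunk types handled are assumptions, not the exact output of the agent.

```js
// Illustrative only: collapse a Claude 3 multi-modal content array into a
// single string, keeping text chunks and substituting placeholders elsewhere.
function stringifyChunks(content) {
  if (typeof content === 'string') {
    return content
  }
  return (content ?? [])
    .map((chunk) => {
      switch (chunk.type) {
        case 'text':
          return chunk.text
        case 'image':
          return '<image>'
        default:
          return `<${chunk.type}>`
      }
    })
    .join(' ')
}
```

So the elephant example above would come out as something like "What is this animal <image> and where does it live?".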
How to Test

The AI Monitoring sample app is running this version of the instrumentation for Anthropic models. The AI Monitoring team can help you out with where to see the data it produces.
Related Issues
N/A